Class-based Aggressive Feature Selection for Polynomial Networks Text Classifiers – an Empirical Study
Abstract
Feature Selection (FS) is a crucial preprocessing step in Text Classification (TC) systems. FS can be either Class-Based or Corpus-Based. Polynomial Network (PN) classifiers have recently proved competitive in TC while using only a very small subset of a corpus's features. This paper presents an empirical study of the performance of PN classifiers using Aggressive Class-Based FS. Seven state-of-the-art FS metrics are evaluated and compared: Chi Square (CHI), Information Gain (IG), Odds Ratio (OR), GSS, NGL coefficient, Document Frequency (DF), and Gain Ratio (GR). The study is conducted on the Reuters Benchmark Corpus. Experimental results are presented in terms of micro-averaged and macro-averaged precision, recall, and F-measure. The results reveal that, for PN classifiers on Reuters, the aggressive Class-Based Chi Square and DF metrics work best among the seven FS metrics evaluated.
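As a rough illustration of what class-based selection with the Chi Square metric can look like, the sketch below scores every term separately for each class and keeps the top k terms per class. It is not the authors' implementation: the function names, the toy documents, the class labels, and the alphabetical tie-breaking rule are all assumptions made for this example, and the score uses the standard 2x2 contingency-table form of the chi-square statistic.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for one class from a 2x2 contingency table.

    a: in-class documents containing the term
    b: out-of-class documents containing the term
    c: in-class documents lacking the term
    d: out-of-class documents lacking the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def class_based_top_k(docs, labels, k):
    """Return {class: top-k terms by chi-square}, one feature list per class.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    doc_sets = [set(doc) for doc in docs]
    vocab = sorted(set().union(*doc_sets))
    n_docs = len(docs)
    selected = {}
    for cls in sorted(set(labels)):
        n_cls = sum(1 for y in labels if y == cls)
        scored = []
        for term in vocab:
            a = sum(1 for s, y in zip(doc_sets, labels) if y == cls and term in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != cls and term in s)
            c = n_cls - a
            d = n_docs - n_cls - b
            scored.append((chi_square(a, b, c, d), term))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        selected[cls] = [term for _, term in scored[:k]]
    return selected


# Hypothetical toy corpus: tokenized documents and their class labels.
docs = [
    ["wheat", "price", "export"],
    ["wheat", "harvest", "price"],
    ["oil", "price", "barrel"],
    ["oil", "export", "barrel"],
    ["tariff", "export"],
    ["tariff", "price"],
]
labels = ["grain", "grain", "crude", "crude", "trade", "trade"]

# "Aggressive" selection here simply means a very small k per class.
features = class_based_top_k(docs, labels, k=1)
```

A corpus-based variant would instead merge the per-class scores into one global ranking (for example by taking the maximum or the average over classes) and keep a single feature list shared by all classes.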
Similar resources
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and may even treat minority-class data as outliers. The text data is one of t...
Polynomial Neural Networks versus Other Arabic Text Classifiers
Many Text Classification (TC) algorithms have been proposed for Arabic TC. Polynomial Neural Networks (PNNs) were used recently in English TC, and have proved to be competitive to the state of the art text classifiers in this field. Lately, they were proposed for classifying Arabic documents. In this research paper, an experimental study that directly compares PNNs against five famous classific...
Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate effectiveness of proposed feature selection method, we ...
Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran
Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...
Evaluation of Classifiers in Software Fault-Proneness Prediction
Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...